Evaluating sensitivity to classification uncertainty in latent subgroup effect analyses

Background Increasing attention is being given to assessing treatment effect heterogeneity among individuals belonging to qualitatively different latent subgroups. Inference routinely proceeds by first partitioning the individuals into subgroups, then estimating the subgroup-specific average treatment effects. However, because the subgroups are only latently associated with the observed variables, the actual individual subgroup memberships are rarely known with certainty in practice and thus have to be imputed. Ignoring the uncertainty in the imputed memberships precludes misclassification errors, potentially leading to biased results and incorrect conclusions. Methods We propose a strategy for assessing the sensitivity of inference to classification uncertainty when using such classify-analyze approaches for subgroup effect analyses. We exploit each individual’s typically nonzero predictive or posterior subgroup membership probabilities to gauge the stability of the resultant subgroup-specific average causal effects estimates over different, carefully selected subsets of the individuals. Because the membership probabilities are subject to sampling variability, we propose Monte Carlo confidence intervals that explicitly acknowledge the imprecision in the estimated subgroup memberships via perturbations using a parametric bootstrap. The proposal is widely applicable and avoids stringent causal or structural assumptions that existing bias-adjustment or bias-correction methods rely on. Results Using two different publicly available real-world datasets, we illustrate how the proposed strategy supplements existing latent subgroup effect analyses to shed light on the potential impact of classification uncertainty on inference. First, individuals are partitioned into latent subgroups based on their medical and health history. Then within each fixed latent subgroup, the average treatment effect is assessed using an augmented inverse propensity score weighted estimator. Finally, utilizing the proposed sensitivity analysis reveals different subgroup-specific effects that are mostly insensitive to potential misclassification. Conclusions Our proposed sensitivity analysis is straightforward to implement, provides both graphical and numerical summaries, and readily permits assessing the sensitivity of any machine learning-based causal effect estimator to classification uncertainty. We recommend making such sensitivity analyses more routine in latent subgroup effect analyses. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01720-8.

one describes class-specific cell probabilities in multi-way contingency tables enumerating all possible combinations of individual responses to the categorical variables, and another describes ("prior" mixing) probabilities representing the population distribution of the classes.
To obtain the perturbed class membership probabilities, we propose sampling the parameters using a parametric bootstrap approach [Dias and Vermunt, 2006] as follows. We describe the steps using the poLCA package [Linzer et al., 2011] for fitting latent class models.
a. Fit the latent class model to the observed data.

A.2 Model-based clustering
When the latent subgroups are measured by multiple continuous variables, a (model-based) clustering approach, such as a finite (multivariate) Gaussian mixture model [Fraley and Raftery, 2002] can be employed to characterize partitions with similar distributions of the manifest variables. The latent class model is parameterized by the components of a mixture distribution, such as the class-specific mean and covariance. We propose sampling the parameters using a similar parametric bootstrap approach to obtain the perturbed class membership probabilities. We describe the steps using the mclust package [Scrucca et al., 2016] for model-based clustering with Gaussian finite mixture models.
a. Fit the mixture model to the observed data.

B Monte Carlo Simulation Studies
We conducted Monte Carlo simulation studies to evaluate the operating characteristics of the proposed method. First, we assessed the ability of a constructed trajectory of the subgroupspecific effect estimates to recover the true subgroup-specific effect. Second, we compared the empirical coverages of the confidence intervals merely holding the estimated subgroup memberships fixed with the perturbed intervals that combined the CIs from the parametric bootstrap.
Furthermore, we evaluated the ability of an outcome model-based estimator to correct misclassification biases [Gardner, 2020]. Following Proposition 1 of Gardner [2020], the subgroup-specific average potential outcomes under the true memberships can be recovered from the average observed outcomes under the given subgroup memberships as follows: where E{Y (z)|x, C * }, and E(Y |z, x, C), are both vectors of length |C| with the c-th elements E{Y (z)|x, C * = c}, and E(Y |Z = z, X = x, C = c), respectively. Let P z,x denote the |C| × |C| matrix of classification probabilities, conditional on treatment Z = z and covariates X = x, with the (j, k)-th entry being: where λ k is the (predictive or posterior) probability of belonging to subgroup k ∈ C. Let E{Y (z)|C * = c} = x E{Y (z)|x, C * = c} Pr(X = x|C * = c) denote the average potential outcome marginalized over the distribution of the covariates X within the (true) subgroup c, where The c-th subgroup-specific effect is then τ c = E{Y (1)|C * = c}−E{Y (0)|C * = c}. An estimator is obtained by plugging in an estimated outcome modelm(z, x, C = c) for E(Y |z, x, C = c) given the imputed subgroups.

B.1 Study 1: Latent class analysis
We generated a single population of size N = 10 5 with two latent classes. Exactly Λ × N individuals were assigned to class 1 (C * i = 1), while the remaining (1 − Λ) × N individuals were assigned to class 2 (C * i = 2). The population proportions of individuals in class 1 and 2 were thus Λ and 1 − Λ, respectively; we set Λ = 0.4. The covariates and potential outcomes for each individual in each class c = 1, 2 were then randomly generated as: The actual class-specific average treatment effects could then be calculated simply as the average difference in potential outcomes among individuals within each class; i.e., The dichotomous indicator variables, X di , d = 1, . . . , 6, were used to measure the latent class membership, with π c being the probability of observing X di = 1 among members of class c (whose C * i = c). We set π 1 = 0.75 and π 2 = 0.25 so that the response patterns between the two classes were sufficiently different. We set the class-specific treatment coefficients β 11 = −2.4 and β 12 = 1.8 so that they had opposite signs with a larger magnitude for class 1.
The true class-specific population average treatment effects are displayed in Table 1. Ignoring the possibility of heterogeneous treatment effects would lead to the erroneous conclusion that the treatment only slightly benefitted the population on average when there was a harmful average effect for a subpopulation. Each observed dataset of sample size n = 1000 was then generated as follows. First, exactly Λ × n individuals were randomly sampled (without replacement) from the subpopulation in class 1 (C * i = 1) of size Λ × N , while the remaining (1 − Λ) × n individuals were randomly sampled (without replacement) from the subpopulation in class 2 (C * i = 2) of size (1 − Λ) × N . Next, the observed treatments for the sampled individuals in each class c = 1, 2 were randomly generated as: The observed outcomes were then revealed as We considered two different settings for the roles of X i . First, we restricted X i to only be indicators of the latent classes, by setting β d+1,c = α dc = 0 for d = 1, . . . , 6; c = 1, 2. Hence treatment was effectively randomized, and only the potential outcomes under treatment Y (1) may have differed between the two classes. This setting was used to demonstrate that the outcome-based estimator of Gardner [2020] corrected for misclassification biases. Next, the indicators X i were simultaneously common causes of treatment and outcome. For simplicity, we set the class-specific parameters in the outcome models as: β 21 = β 31 = β 41 = 0.7, β 51 = β 61 = β 71 = 0, and β d+1,2 = 1 − β d+1,1 , d = 1, . . . , 6. Similarly, we set the class-specific parameters in the propensity score models as: α 11 = α 21 = α 31 = 0.7, α 41 = α 51 = α 61 = 0, and α d2 = 1 − α d1 , d = 1, . . . , 6. Therefore, the confounders were non-overlapping covariate subsets between the two classes so that individuals who were misclassified could potentially be subject to inadequate control for confounding.
The individual class memberships were estimated by fitting a latent class model with two classes to the indicators X di , d = 1, . . . , 6, using the poLCA package in R. We calculated the class-specific effect estimates and the 95% CIs either under the true (unknown) individual class memberships or holding the estimated individual class memberships fixed. To avoid conflating the biases due to misclassification with biases due to label switching, we simply compared the estimated and true class memberships in each sample, then relabelled the classes to maximize the similarity between the estimated and true class memberships, and subsequently, the number of individuals who would be correctly classified. For each latent class c ∈ C in turn, we constructed a trajectory of the subgroup-specific effect estimates based on the estimated class membership probabilities following the procedure described in the main text. To account for the uncertainty in the estimated class membership probabilities, we then calculated the class-specific effect estimates and the 95% CIs using 100 perturbed probabilities following the procedure described in the main text. We combined the separate CIs to construct the perturbed CI across all perturbations.

B.2 Study 2: Gaussian mixture distribution
In this study, the latent subgroups were determined by the components of a multivariate Gaussian mixture distribution for the covariates. The data-generating process in this study differed from that of the previous study only in that the covariates (and possibly manifest indicators of the latent subgroups) X i were randomly drawn from a multivariate normal distribution: The (subgroup-specific) mean and covariance are denoted by µ c and Σ c , respectively. For simplicity, we used the same values as study 1 for all parameters in the treatment and outcome data-generating models. The observed datasets were generated using the same procedure. In each observed sample, the individual subgroup memberships were estimated by fitting a multivariate Gaussian finite mixture model with two subgroups (components) to the covariates X i , using the mclust package Scrucca et al. [2016] in R. All other aspects of the simulation study were maintained.

B.3 Results
The results for 1000 replications under each setting are presented below in Tables 1 and 2.  Table 1: Summaries of the constructed trajectories of point estimates for the class-specific effects in the simulation studies. The proportion of individuals in class 1 was Λ. The latent class indicators were either not confounders ("No") or simultaneously confounders ("Yes") of treatment and outcome. The trajectories were constructed using estimated class membership probabilities from either a latent class analysis ("LCA"; study 1) or Gaussian mixture model ("GMM"; study 2).  The proportion of individuals in class 1 was Λ. The latent class indicators were either not confounders ("No") or simultaneously confounders ("Yes") of treatment and outcome. The individual class memberships used to separate the individuals were either the true unknown values, or estimated using either a latent class analysis ("LCA"; study 1) or Gaussian mixture model ("GMM"; study 2) then held fixed, or based on perturbed probabilities that accounted for the uncertainty in the estimates. Either an augmented inverse propensity score weighted (AIPW) estimator or the outcome model-based bias-corrected estimator of Gardner [2020] was utilized within each subgroup. Note. All effect estimates were multiplied by 100 to improve readability. The empirical SE ("ESE") of the effect estimates (i.e., standard deviation around the average over all simulated datasets), and root mean squared error ("RMSE") of the effect estimates (calculated as the square root of the mean of squares of the estimates centered at the true value), are presented. The average proportions of correctly classified individuals across all simulated datasets are stated. A dashed line ("−") indicates a non-applicable setting. The true population class-specific effects are displayed in Table 1. All results were rounded to two decimal places.  or quintile (if the indicator was discretized) that a patient representative of that class would exhibit for that characteristic.